Edlib: a C/C++ library for fast, exact sequence alignment using edit distance

Authors

  • Martin Sosic
  • Mile Sikic

Abstract

Summary: We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest while not lacking in functionality, and that it can also easily handle very large sequences. Because it is easy to use, flexible, fast and low on memory usage, we expect Edlib to be easily adopted as a building block for future bioinformatics tools.

Availability and Implementation: Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib, under the MIT licence. Edlib is implemented in C/C++ and supported on Linux, MS Windows, and Mac OS.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
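To make concrete what "exact pairwise alignment using edit distance" computes, here is a minimal sketch using the textbook quadratic dynamic program with unit costs for insertions, deletions and substitutions. It is a plain Python illustration, not Edlib's algorithm or API: the names `align` and `to_cigar`, and the operation letters (modelled on the extended CIGAR letters `=`, `X`, `I`, `D`, with the query playing the role of the read and the target the reference), are hypothetical choices for this example.

```python
def align(query: str, target: str) -> tuple[int, str]:
    """Global (end-to-end) unit-cost alignment via the O(n*m) dynamic program.

    Returns the edit distance and one optimal alignment as a string of ops:
    '=' match, 'X' substitution, 'I' character only in query,
    'D' character only in target.
    """
    n, m = len(query), len(target)
    # dp[i][j] = edit distance between query[:i] and target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if query[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # query char unmatched
                           dp[i][j - 1] + 1)         # target char unmatched
    # Trace back from the bottom-right corner to recover one optimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (query[i - 1] != target[j - 1]):
            ops.append('=' if query[i - 1] == target[j - 1] else 'X')
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append('I')
            i -= 1
        else:
            ops.append('D')
            j -= 1
    return dp[n][m], ''.join(reversed(ops))


def to_cigar(ops: str) -> str:
    """Run-length encode an op string, e.g. '==X=' -> '2=1X1='."""
    out = []
    k = 0
    while k < len(ops):
        run = 1
        while k + run < len(ops) and ops[k + run] == ops[k]:
            run += 1
        out.append(f"{run}{ops[k]}")
        k += run
    return ''.join(out)
```

For example, `align("kitten", "sitting")` reports a distance of 3. This sketch stores the full matrix, so it needs O(n·m) time and memory and would not scale to the very large sequences the abstract mentions; Edlib's authors instead build on Myers's bit-vector algorithm, which processes many matrix cells per machine word.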

Similar resources

SlideSort: all pairs similarity search for short reads

MOTIVATION Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge amount of short reads. Searching similar pairs from a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other important analyses. RESULTS In this study, we designed and implemented an exact algorithm SlideSor...

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping

Motivation: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend bas...

Fast Similarity Searches and Similarity Joins in Oracle DB

Similarity search and similarity join on strings are important operations for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences [GIJ+01, NMS04]. Especially DNA sequencing produces large collections of erroneous strings which need to be searched, compared, and merged. In our talk, we will use ESTs as our running example. ESTs (Expres...

A Systolic Array for the Sequence Alignment Problem

This report introduces a new systolic algorithm for the sequence alignment problem. This work builds upon an existing systolic array for computing the edit distance between two sequences. The alignment array is meant to be used as the second phase in a two-phase design with a modified edit distance array serving as the first phase. An implementation on the SPLASH programmable logic array is descri...
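Systolic edit-distance arrays, like the one this last entry builds on, rely on a data-dependence property of the edit-distance matrix: cell (i, j) needs only its upper, left and upper-left neighbours, all of which lie on earlier anti-diagonals. Every cell on one anti-diagonal is therefore independent of the others. The minimal sketch below, assuming unit costs, only illustrates that wavefront evaluation order; it is not the report's systolic algorithm, and `edit_distance_antidiagonal` is a hypothetical name.

```python
def edit_distance_antidiagonal(a: str, b: str) -> int:
    """Unit-cost edit distance, filling the DP matrix one anti-diagonal at a time.

    Cell (i, j) depends on (i-1, j-1), (i-1, j) and (i, j-1), which sit on
    anti-diagonals i + j - 2 and i + j - 1. All cells with the same d = i + j
    can thus be computed independently of each other (in parallel hardware,
    concurrently); here the inner loop simply runs sequentially.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):
        # Visit every (i, j) with i + j == d that lies inside the matrix.
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if i == 0:
                dp[i][j] = j          # first row: j insertions
            elif j == 0:
                dp[i][j] = i          # first column: i deletions
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j - 1] + cost,
                               dp[i - 1][j] + 1,
                               dp[i][j - 1] + 1)
    return dp[n][m]
```

Because only the evaluation order changes, the result matches the ordinary row-by-row dynamic program; for instance, `edit_distance_antidiagonal("flaw", "lawn")` is 2.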

Volume 33

Publication date: 2017